Comparing HMM, maximum entropy, and conditional random fields for disfluency detection
نویسندگان
چکیده
Automatic detection of disfluencies in spoken language is important for making speech recognition output more readable, and for aiding downstream language processing modules. We compare a generative hidden Markov model (HMM)-based approach and two conditional models — a maximum entropy (Maxent) model and a conditional random field (CRF) — for detecting disfluencies in speech. The conditional modeling approaches provide a more principled way to model correlated features. In particular, the CRF approach directly detects the reparandum regions, and thus avoids the use of ad-hoc heuristic rules. We evaluate performance of these three models across two different corpora (conversational speech and broadcast news) and for two types of transcriptions (human transcriptions and recognition output). Overall we find that that the conditional modeling approaches (Maxent and CRF) provide benefit over the HMM approach. Effects of speaking style, word recognition errors, and future directions are also discussed.
منابع مشابه
應用不定長度特徵之條件隨機域於口語不流暢語流修正 (Disfluency Correction of Spontaneous Speech using Conditional Random Fields with Variable Length Features) [In Chinese]
This paper presents an approach to detecting and correcting edit disfluency based on conditional random fields with variable-length features. The variable-length features consist of word, chunk and sentence features. Conditional random fields (CRF) are adopted to model the properties of the edit disfluency, including repair, repetition and restart, for edit disfluency detection. For the evaluat...
متن کاملUsing Integer Linear Programming for Detecting Speech Disfluencies
We present a novel two-stage technique for detecting speech disfluencies based on Integer Linear Programming (ILP). In the first stage we use state-of-the-art models for speech disfluency detection, in particular, hidden-event language models, maximum entropy models and conditional random fields. During testing each model proposes possible disfluency labels which are then assessed in the presen...
متن کاملUsing Conditional Random Fields for Sentence Boundary Detection in Speech
Sentence boundary detection in speech is important for enriching speech recognition output, making it easier for humans to read and downstream modules to process. In previous work, we have developed hidden Markov model (HMM) and maximum entropy (Maxent) classifiers that integrate textual and prosodic knowledge sources for detecting sentence boundaries. In this paper, we evaluate the use of a co...
متن کاملCross-Domain Speech Disfluency Detection
We build a model for speech disfluency detection based on conditional random fields (CRFs) using the Switchboard corpus. This model is then applied to a new domain without any adaptation. We show that a technique for detecting speech disfluencies based on Integer Linear Programming (ILP) (Georgila, 2009) significantly outperforms CRFs. In particular, in terms of F-score and NIST Error Rate the ...
متن کاملAcoustic Modeling Based on Deep Conditional Random Fields
Acoustic modeling based on Hidden Markov Models (HMMs) is employed by state-of-theart stochastic speech recognition systems. In continuous density HMMs, the state scores are computed using Gaussian mixture models. On the other hand, Deep Neural Networks (DNN) can be used to compute the HMM state scores. This leads to significant improvement in the recognition accuracy. Conditional Random Fields...
متن کامل